Use pass@1 for all evals #633

lewtun · 2025-05-05T14:32:54Z

This PR bumps lighteval to use pass@1 metrics for all evals, with n > 1 samples per prompt to mitigate variance from repeated runs:

AIME24/25: 64 samples per prompt
GPQA: 8 samples per prompt
MATH_500: 4 samples per prompt
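
To make the sampling scheme above concrete, here is a minimal sketch of how pass@1 can be estimated from n samples per prompt, using the standard unbiased pass@k estimator 1 - C(n - c, k) / C(n, k), where c is the number of correct samples. The function names `pass_at_k` and `benchmark_pass_at_1` are hypothetical and are not taken from lighteval, whose implementation may differ; see the linked lighteval PR for the actual metric.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int = 1) -> float:
    """Unbiased pass@k estimate for one prompt.

    n: samples drawn for the prompt, c: how many were correct.
    Hypothetical helper for illustration, not lighteval's API.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when n - c < k, so the estimate is then 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(correct_counts: list[int], n: int) -> float:
    """Average the per-prompt pass@1 estimates over a benchmark."""
    return mean(pass_at_k(n, c, k=1) for c in correct_counts)


if __name__ == "__main__":
    # Four made-up prompts with n = 4 samples each (illustrative data only).
    print(benchmark_pass_at_1([4, 2, 0, 1], n=4))
```

For k = 1 the estimator reduces to c / n, the fraction of correct samples. Assuming the samples are independent with per-sample success probability p, the standard deviation of that fraction is sqrt(p(1 - p) / n), so drawing 64 samples rather than 1 shrinks the per-prompt spread by a factor of 8.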

See huggingface/lighteval#698 for more details.

Note that I've updated the leaderboard to include the n values per benchmark to avoid breaking backwards compatibility.

TODO

Update README with new eval scores

lewtun added 2 commits May 5, 2025 14:30

Use pass@1 for all evals

48ec742

Update scores

23cc5fe

lewtun changed the title ~~[WIP] Use pass@1 for all evals~~ Use pass@1 for all evals May 7, 2025

lewtun requested a review from edbeeching May 7, 2025 08:53

Merge branch 'main' into pass_at_1_with_large_n

91f4d8d

edbeeching approved these changes May 9, 2025

lewtun merged commit c802f00 into main May 9, 2025
1 check passed

lewtun deleted the pass_at_1_with_large_n branch May 9, 2025 15:42

3 participants
